Predicting food crises using news streams

Anticipating food crisis outbreaks is crucial to efficiently allocate emergency relief and reduce human suffering. However, existing predictive models rely on risk measures that are often delayed, outdated, or incomplete. Using the text of 11.2 million news articles focused on food-insecure countries and published between 1980 and 2020, we leverage recent advances in deep learning to extract high-frequency precursors to food crises that are both interpretable and validated by traditional risk indicators. We demonstrate that over the period from July 2009 to July 2020 and across 21 food-insecure countries, news indicators substantially improve the district-level predictions of food insecurity up to 12 months ahead relative to baseline models that do not include text information. These results could have profound implications on how humanitarian aid gets allocated and open previously unexplored avenues for machine learning to improve decision-making in data-scarce environments.


S1 Robustness checks
In this section, we present some robustness checks showing that news factors consistently improve the predictions of food insecurity for a wide range of specifications.

S1.1 Alternative model specifications
Our preferred specification is the random forest regression model described in equation (2). Z-scoring all the input variables has no significant impact on the predictions according to a Diebold-Mariano test (p-value = 0.0619). We also tried estimating an OLS and a Lasso regression instead of the random forest (Fig. S10A). Compared to an RMSE of 0.0819 obtained for the random forest regression with traditional+news factors, a Lasso regression leads to a significantly higher RMSE of 10.002 (p-value = 0.0373) with 1,949 news factors, and of 10.012 with 167 news factors (p-value = 0.0455). An OLS regression with 167 factors also leads to a significantly higher RMSE of 0.0912 (p-value = 0.0483).
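As an illustration of how two sets of predictions can be compared with a Diebold-Mariano test, the following is a minimal sketch. It assumes a squared-error loss, a normal approximation for the test statistic, and a single series of forecasts; the text does not specify the loss function or the variance estimator used, and the function name `diebold_mariano` and its `horizon` parameter are hypothetical.

```python
import math
import numpy as np

def diebold_mariano(y_true, pred_a, pred_b, horizon=1):
    """Two-sided Diebold-Mariano test on squared-error loss differentials.

    Returns (statistic, p_value). A negative statistic means model A has
    the lower average squared error. Assumption: squared-error loss and a
    standard normal reference distribution.
    """
    y_true, pred_a, pred_b = (
        np.asarray(v, dtype=float) for v in (y_true, pred_a, pred_b)
    )
    # Loss differential between the two models for each observation.
    d = (y_true - pred_a) ** 2 - (y_true - pred_b) ** 2
    n = d.size
    centered = d - d.mean()
    # Long-run variance: variance plus autocovariances up to lag horizon - 1.
    long_run_var = centered @ centered / n
    for lag in range(1, horizon):
        long_run_var += 2.0 * (centered[lag:] @ centered[:-lag]) / n
    stat = d.mean() / math.sqrt(long_run_var / n)
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|stat|)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

With the default `horizon=1` the variance term reduces to the sample variance of the loss differential. For longer horizons the truncated estimator above can occasionally be negative in small samples, in which case a different variance estimator would be needed.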

S1.3 Alternative text features
The 167 news factors included in the models presented in Fig. 3 [...] Taken together, these ablation studies indicate that all the steps of our method to discover relevant keyphrases are necessary to obtain large reductions in RMSE.

S1.4 Geolocating the news
Each news indicator is constructed by counting the co-occurrences of a text feature and geographic mentions. However, naive string matching of country, province, or district names could lead to false positives. For example, an article could mention the text feature "conflict" and the country "Nigeria" even if no conflict is happening in Nigeria. To reduce the chance of false positives, we try a more conservative approach in which we only consider geographic units mentioned in the same sentence as a text feature. While the conservative approach is expected to reduce false positives, it could lead to more false negatives when, for example, the true location of an event is mentioned in a neighboring sentence. In practice, the conservative approach slightly increases the RMSE of the traditional+news model to 0.0928 (Diebold-Mariano test, p-value = 0.0351), which suggests that occasional misclassifications of the location where an event is occurring have little effect on the results (Fig. S10B).
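The difference between article-level and sentence-level matching can be illustrated with a short sketch. It assumes case-insensitive substring matching and a simple punctuation-based sentence splitter, neither of which is specified in the text; the function name `cooccurs` is hypothetical.

```python
import re

def cooccurs(article, feature, place, same_sentence=False):
    """Return True if `feature` and `place` co-occur in `article`.

    same_sentence=False: both only need to appear somewhere in the article.
    same_sentence=True: conservative variant, both must appear in one sentence.
    Assumption: naive substring matching and punctuation-based splitting.
    """
    if same_sentence:
        # Split after sentence-ending punctuation followed by whitespace.
        units = re.split(r"(?<=[.!?])\s+", article)
    else:
        units = [article]
    feature, place = feature.lower(), place.lower()
    return any(feature in u.lower() and place in u.lower() for u in units)

article = (
    "Conflict escalated in the Sahel this week. "
    "Traders in Nigeria reported stable prices."
)
article_level = cooccurs(article, "conflict", "Nigeria")
sentence_level = cooccurs(article, "conflict", "Nigeria", same_sentence=True)
```

In this example the article-level match counts a co-occurrence of "conflict" and "Nigeria", while the sentence-level match does not, which is the kind of false positive the conservative approach is meant to remove.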

S1.5 Intensity of reporting
Measuring the proportion of news articles mentioning a text feature allows us to account for the intensity of reporting relative to the overall coverage that a district is receiving. In some cases, there could be a bias towards underreporting events, for example when an authoritarian regime controls the media, or towards overreporting events that are more headline-grabbing. As a robustness check, we try replacing each news indicator with a binary indicator equal to one if at least one article mentions a text feature in a month and zero otherwise. This degrades the RMSE of the traditional+news model to 0.1043 (Diebold-Mariano test, p-value = 0.0261), which confirms that considering the multiplicity of articles mentioning a text feature is warranted (Fig. S10B).
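The two variants of the indicator can be contrasted in a few lines. This is a minimal sketch for a single district-month; the function name `news_indicators` is hypothetical, and returning a proportion of zero when a district has no coverage is an assumption not stated in the text.

```python
def news_indicators(n_mentioning, n_total):
    """Return (proportion, binary) indicators for one district-month.

    n_mentioning: articles about the district that mention the text feature.
    n_total: all articles about the district in that month.
    Assumption: a month with no coverage gets a proportion of 0.0.
    """
    proportion = n_mentioning / n_total if n_total > 0 else 0.0
    binary = 1 if n_mentioning > 0 else 0
    return proportion, binary

# One mention and fifty mentions out of 200 articles look identical to the
# binary indicator but differ by a factor of fifty in the proportion.
low = news_indicators(1, 200)
high = news_indicators(50, 200)
```

The binary variant discards the information that separates `low` from `high`, which is consistent with the loss of accuracy reported for that specification.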
Finally, we also tested whether the volume of news is predictive of food insecurity (Fig. S10B). We observe large variations in the volume of news across districts and over time, which prompted us to construct our news factors by counting the number of articles containing a text feature and normalizing by the volume of news within each district. In theory, one could expect news coverage to go up or down as a crisis unfolds, depending on the context. In practice, we find that [...]

[...] semantic parsing, we find 5,228 candidate features mentioned in the news and with a word mover's distance to an original feature smaller than 10. After ranking candidate features by increasing distance to a seed and partitioning them into 50 groups of equal size, we report the proportion of candidate features within each group passing the Granger causality test (y-axis) and the average distance to an original feature within each group (x-axis). As the distance to a seed gets close to 6, the proportion of candidate features predicting the IPC phase approaches zero, providing support to our choice of exploring the space of semantic neighbors up to a distance of 6.
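The grouping step described in this caption (rank candidates by distance, partition them into equal-size groups, then summarize each group) can be sketched as follows. The distances and pass/fail flags are taken as given inputs; computing the word mover's distance and running the Granger causality test are outside the scope of this sketch, and the function name `pass_rate_by_distance` is hypothetical.

```python
import numpy as np

def pass_rate_by_distance(distances, passed, n_groups=50):
    """Summarize test outcomes by distance-ranked groups of near-equal size.

    distances: distance of each candidate feature to its seed feature.
    passed: boolean flag per candidate (True if it passed the test).
    Returns (mean_distance, pass_rate), one entry per group.
    Assumption: there are at least n_groups candidates.
    """
    distances = np.asarray(distances, dtype=float)
    passed = np.asarray(passed, dtype=bool)
    # Rank candidates by increasing distance, then cut into groups.
    order = np.argsort(distances, kind="stable")
    groups = np.array_split(order, n_groups)
    mean_distance = np.array([distances[g].mean() for g in groups])
    pass_rate = np.array([passed[g].mean() for g in groups])
    return mean_distance, pass_rate
```

Plotting `pass_rate` against `mean_distance` would give a curve of the kind the caption describes, with the pass rate on the y-axis and the average distance on the x-axis.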

Fig. S7. Cross-validation.
A timeline describing our cross-validation methodology. We temporally split the observation period into 10 folds. Each fold is temporally split into training, validation, and test periods. We iteratively train the model on the training period of each fold. We then evaluate the RMSE on the validation period for each combination of hyperparameters, and we find the hyperparameters which minimize the RMSE on the validation period. Finally, we compute the RMSE on the test period using the optimal hyperparameters and we report the unweighted average RMSE across the test periods of the 10 folds.

[...] insecurity at 3-, 6-, 9-, and 12-month horizons from expert forecasts (unavailable at 9- and 12-month horizons) and using random forest regressions estimated on the 21 countries for which expert forecasts, traditional and news factors are available over the period July 2009 to July 2020. We ensure that no observation from the training period is used to evaluate a model's performance. These results demonstrate that news indicators also improve the prediction of food insecurity up to twelve months ahead.

[...] We compare the predictive performance of the random forest model with traditional+news factors with the alternative specifications for the set of text features described in sections S1.3-S1.5. (C) R-squared and adjusted R-squared of each model measured on the test set. (D) We report the area under the precision-recall curve (AUC) and its standard error (34) for each classification model of food crisis outbreaks (in column) and for different definitions of an outbreak (in row). We also report the recall of each model at different precision levels (in row). For both metrics, the row showing our preferred specification is highlighted in bold. These results demonstrate that including news indicators consistently improves the traditional model's predictions.
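The cross-validation procedure described in the caption of Fig. S7 can be sketched as follows. Several details are assumptions, since the caption does not state them: folds are taken as contiguous, non-overlapping blocks of time periods; each block is split 60/20/20 into training, validation, and test periods; hyperparameters are selected separately within each fold; and rows are sorted by time with one row per period. The `fit` callback interface is hypothetical.

```python
import numpy as np

def temporal_folds(n_periods, n_folds=10, train_frac=0.6, val_frac=0.2):
    """Yield (train, val, test) index arrays for each fold.

    Assumption: each fold is a contiguous block of periods, split in time
    order so that training precedes validation, which precedes test.
    """
    for block in np.array_split(np.arange(n_periods), n_folds):
        n_train = int(round(len(block) * train_frac))
        n_val = int(round(len(block) * val_frac))
        yield block[:n_train], block[n_train:n_train + n_val], block[n_train + n_val:]

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def cross_validate(X, y, fit, grid, n_folds=10):
    """Average test RMSE across folds, tuning on each validation period.

    fit(X_train, y_train, params) must return a function mapping X to
    predictions (hypothetical interface). grid is a list of candidate
    hyperparameter settings.
    """
    test_scores = []
    for train, val, test in temporal_folds(len(y), n_folds):
        # Train one model per hyperparameter setting on the training period.
        models = [fit(X[train], y[train], params) for params in grid]
        # Keep the setting with the lowest RMSE on the validation period.
        val_scores = [rmse(y[val], model(X[val])) for model in models]
        best = models[int(np.argmin(val_scores))]
        test_scores.append(rmse(y[test], best(X[test])))
    # Unweighted average across the test periods of all folds.
    return float(np.mean(test_scores))
```

Because the test indices of a fold are never passed to `fit`, no observation used for evaluation is seen during training within that fold.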